WHAT IS CLAIMED IS: 



1. A system for providing adaptive playback of an audio signal,
comprising: 

storing received audio data to a signal buffer; 
outputting parts of the signal present in the signal buffer as needed for 
signal playback; 

analyzing the contents of the signal buffer; 

stretching at least part of the signal present in the signal buffer when the 
analysis of the contents of the signal buffer indicates that the length of the signal 
in the signal buffer is less than a predetermined threshold; and 

compressing at least part of the signal present in the signal buffer when 
the analysis of the contents of the signal buffer indicates that the length of the 
signal in the signal buffer is greater than a predetermined threshold. 

2. The system of claim 1 wherein analyzing the contents of the signal 
buffer includes determining a type of the contents of the signal buffer from among 
a group including: periodic content, quasi-periodic content, aperiodic content and 
mixed content. 

3. The system of claim 2 wherein stretching at least part of the signal 
having any of periodic content and quasi-periodic content type comprises: 

identifying at least one segment of the content of the signal buffer
as a template;

searching for a matching segment in portions of the content of the signal 
buffer whose cross correlation peak exceeds a predetermined threshold; and 

inserting the template into the content of the signal buffer, and aligning 
and merging the matching segments. 
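The template search in claim 3 can be illustrated with a minimal sketch. It assumes the template is the final `template_len` samples of the buffer, searches a lag range `min_lag..max_lag` for the best normalized cross-correlation peak above a threshold (0.7 here is an illustrative value), and, as a simplification, stretches by appending one detected period instead of performing a full overlap-add merge. The names `find_match` and `stretch_periodic` are hypothetical.

```python
import math


def normalized_xcorr(a, b):
    """Normalized cross-correlation of two equal-length segments at zero lag."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def find_match(signal, template_len, min_lag, max_lag, threshold):
    """Search earlier in the buffer for the segment that best matches the final
    `template_len` samples (the template). Returns (lag, peak), or None when no
    candidate's correlation peak exceeds `threshold`."""
    n = len(signal)
    template = signal[n - template_len:]
    best = None
    for lag in range(min_lag, max_lag + 1):
        start = n - template_len - lag
        if start < 0:
            break
        peak = normalized_xcorr(template, signal[start:start + template_len])
        if peak > threshold and (best is None or peak > best[1]):
            best = (lag, peak)
    return best


def stretch_periodic(signal, template_len, min_lag, max_lag, threshold=0.7):
    """Lengthen `signal` by one detected period; return an unchanged copy if
    no sufficiently correlated match exists."""
    match = find_match(signal, template_len, min_lag, max_lag, threshold)
    if match is None:
        return list(signal)
    lag, _ = match
    # The template recurs `lag` samples earlier, so the last `lag` samples form
    # one period; appending them continues the waveform at the aligned phase.
    return list(signal) + list(signal[len(signal) - lag:])
```

For a sinusoid with a 20-sample period, `find_match` reports a lag of 20 and the stretched output is 20 samples longer. An impulse with no repeating structure is returned unchanged because no correlation peak clears the threshold.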

4. The system of claim 2 wherein stretching at least part of the signal
having aperiodic content type comprises automatically generating and
inserting at least one synthetic segment into the buffered signal to increase the
length of the content of the signal buffer.

5. The system of claim 4 wherein automatically generating the at least
one synthetic segment comprises:

automatically computing the FFT of the at least part of the signal; 
introducing a random rotation of the phase into the FFT coefficients; and 
computing the inverse FFT for each segment, thereby creating the at least 
one synthetic segment. 
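The phase-randomization steps of claim 5 can be sketched as follows. To stay dependency-free, the sketch uses a naive O(N²) DFT in place of an FFT, which yields the same coefficients. Each bin between DC and Nyquist is rotated by a uniformly random angle, and its conjugate is mirrored into the opposite bin so that the inverse transform is real. The magnitude spectrum is therefore preserved while the waveform changes. The function names and the uniform phase distribution are assumptions.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) DFT; stands in for an FFT to keep the sketch self-contained."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT matching `dft` above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def synthesize_segment(segment, rng):
    """Return a real segment with the same magnitude spectrum as `segment`
    but randomly rotated phases."""
    n = len(segment)
    X = dft(segment)
    Y = list(X)
    for k in range(1, (n + 1) // 2):
        # Rotate bin k by a random angle and mirror the conjugate into bin n-k,
        # keeping the spectrum Hermitian so the inverse transform is real.
        Y[k] = X[k] * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
        Y[n - k] = Y[k].conjugate()
    # DC (and Nyquist, for even n) are left untouched; drop round-off imaginary parts.
    return [v.real for v in idft(Y)]
```

Because only phases are rotated, the synthetic segment has the same spectral envelope and energy as its source, which is why it can be inserted with few audible artifacts.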


6. The system of claim 4 wherein automatically generating the at least 
one synthetic segment comprises: 

applying at least one LPC filter to the at least part of the signal to compute 
an LPC residual; 
computing at least one FFT from the LPC residual;

introducing a random rotation of the phase into the coefficients of at least 
one of the computed FFTs; 

computing inverse FFTs from the FFT coefficients to reconstruct the LPC 
residual; and 

applying at least one inverse LPC filter to the LPC residual, thereby
creating the at least one synthetic segment.
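The LPC path of claim 6 can be sketched by estimating a filter A(z), moving to the residual domain, modifying the residual, and filtering back through 1/A(z). The sketch assumes the autocorrelation method with Levinson-Durbin recursion, uses zero initial filter state, and applies no windowing. The residual-domain step (the FFT phase rotation of claim 6) is passed in as a caller-supplied function; `modify_residual` and the other names are hypothetical.

```python
import random


def autocorr(x, order):
    """Autocorrelation r[0..order] of the sequence x."""
    return [sum(x[t] * x[t - k] for t in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_residual(x, a):
    """Analysis filter A(z): e[n] = x[n] + sum_j a[j] x[n-j], zero initial state."""
    p = len(a) - 1
    return [x[n] + sum(a[j] * x[n - j] for j in range(1, min(p, n) + 1))
            for n in range(len(x))]


def lpc_synthesis(e, a):
    """Inverse (all-pole) filter 1/A(z), zero initial state."""
    p = len(a) - 1
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[j] * y[n - j] for j in range(1, min(p, n) + 1)))
    return y


def synthesize_via_lpc(x, order, modify_residual):
    """Estimate an LPC filter, compute the residual, apply `modify_residual`
    (e.g. a phase-randomizing step), and filter the result back."""
    a = levinson_durbin(autocorr(x, order), order)
    e = lpc_residual(x, a)
    return lpc_synthesis(modify_residual(e), a)
```

With an identity `modify_residual`, the analysis/synthesis pair reconstructs the input, which makes a useful sanity check. Randomizing phases on the flatter residual, rather than on the signal itself, leaves the spectral envelope to the synthesis filter.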

7. The system of claim 1 wherein the predetermined thresholds for
stretching and compressing at least part of the signal present in the signal buffer
are optimized to compensate for clock drift between an encoder and a decoder.

8. A system for providing an adaptive playback of received frames of 
an audio signal transmitted across a packet-based network, comprising: 

receiving and decoding data frames of an audio signal transmitted across
a packet-based network;

storing the decoded data frames to a signal buffer; 
analyzing the contents of the signal buffer;

outputting one or more of the decoded frames present in the signal buffer 
when the analysis of the contents of the signal buffer indicates that the length of 
the signal in the signal buffer is between a predetermined minimum and a 
predetermined maximum buffer size; 

stretching and outputting one or more decoded frames in the signal buffer 
when the analysis of the contents of the signal buffer indicates that the length of 
the decoded frames in the signal buffer is less than the predetermined minimum 
buffer size; and 

compressing and outputting one or more decoded frames in the signal 
buffer when the analysis of the contents of the signal buffer indicates that the 
length of the decoded frames in the signal buffer is greater than the 
predetermined maximum buffer size. 
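The three-way decision in claim 8 can be illustrated with a minimal sketch. It assumes the buffer length is measured in samples before the next frame is removed, and that both bounds are inclusive. The actual stretch and compress operations are supplied as callables. The class name `AdaptivePlayoutBuffer` and its methods are hypothetical.

```python
from collections import deque


class AdaptivePlayoutBuffer:
    """Holds decoded frames (lists of samples) and decides, per output request,
    whether to pass, stretch, or compress the next frame."""

    def __init__(self, min_samples, max_samples, stretch, compress):
        self.frames = deque()
        self.min_samples = min_samples
        self.max_samples = max_samples
        self.stretch = stretch      # callable: frame -> longer frame
        self.compress = compress    # callable: frame -> shorter frame

    def push(self, frame):
        """Store a newly decoded frame."""
        self.frames.append(frame)

    def buffered_samples(self):
        return sum(len(f) for f in self.frames)

    def next_output(self):
        """Return the next frame for playback, time-scaled if needed."""
        if not self.frames:
            return None                 # underflow: left to the caller
        level = self.buffered_samples()
        frame = self.frames.popleft()   # output frames leave the buffer
        if level < self.min_samples:
            return self.stretch(frame)
        if level > self.max_samples:
            return self.compress(frame)
        return frame
```

For example, with bounds of 300 and 600 samples and 160-sample frames, a single buffered frame is stretched, three frames pass through unchanged, and four frames trigger compression. Removing each frame as it is output also matches the behavior recited in claim 9.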

9. The system of claim 8 wherein any frame output from the signal 
buffer is removed from the signal buffer as it is output. 

10. The system of claim 8 further comprising packet loss concealment
for signal packets declared to be late loss packets. 

11. The system of claim 8 wherein stretching and outputting one or
more decoded frames provides automatic jitter control as a function of buffer 
content. 

12. The system of claim 11 wherein stretching one or more decoded
frames further comprises automatically determining a content type of the 
stretched frames prior to stretching those frames. 

13. The system of claim 12 wherein the content type includes any of
voiced frames, unvoiced frames, and mixed frames.

14. The system of claim 13 wherein stretching any voiced frame
comprises:

identifying at least one segment of the voiced frame as a template;
searching for a matching segment in adjacent frames whose cross 
correlation peak exceeds a predetermined threshold; and 

aligning and merging the matching segments of the frame. 

15. The system of claim 8 wherein stretching any unvoiced frame
comprises automatically generating and inserting at least one synthetic segment 
into the current frame to increase a length of the current frame. 

16. The system of claim 15 wherein automatically generating the at
least one synthetic segment comprises: 

automatically computing the FFT of the current frame; 
introducing a random rotation of the phase into the FFT coefficients; and 
computing the inverse FFT for each segment, thereby creating the at least 
one synthetic segment. 

17. The system of claim 15 wherein automatically generating the at
least one synthetic segment comprises: 

applying at least one LPC filter to the current frame to compute an LPC 
residual; 

computing at least one FFT from the LPC residual; 
introducing a random rotation of the phase into the coefficients of at least 
one of the computed FFTs; 

computing inverse FFTs from the FFT coefficients to reconstruct the LPC 
residual; and 

applying at least one inverse LPC filter to the LPC residual, thereby 
creating the at least one synthetic segment. 

18. The system of claim 8 wherein stretching any mixed frame
comprises: 

identifying at least one segment of the frame as a template; 

searching for a matching segment whose cross correlation peak exceeds
a predetermined threshold;

aligning and merging the matching segments of the frame to create an 
interim voiced segment; 

automatically generating and inserting at least one synthetic segment into 
the current frame to create an interim unvoiced segment; 
weighting each of the interim voiced segment and the interim unvoiced
segment relative to a normalized cross correlation peak computed for the current
segment; and

adding and windowing the interim voiced segment and the interim 
unvoiced segment to create a partially synthetic stretched segment. 
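The weighting step of claim 18 can be sketched under one stated assumption. The sketch takes the normalized cross-correlation peak, clamped to [0, 1], as the weight of the interim voiced segment and its complement as the weight of the interim unvoiced segment. The claim does not fix the weighting law, so this linear blend is illustrative. Windowing the result into the output stream is omitted, and `mix_interim_segments` is a hypothetical name.

```python
def mix_interim_segments(voiced, unvoiced, peak):
    """Blend equal-length interim voiced and unvoiced segments, using the
    normalized cross-correlation peak (clamped to [0, 1]) as the voiced weight."""
    if len(voiced) != len(unvoiced):
        raise ValueError("interim segments must have equal length")
    w = min(max(peak, 0.0), 1.0)
    # Strong periodicity (peak near 1) favors the voiced, template-based segment;
    # weak periodicity favors the synthetic, phase-randomized segment.
    return [w * v + (1.0 - w) * u for v, u in zip(voiced, unvoiced)]
```

A peak of 0.25 therefore yields a segment that is one quarter voiced and three quarters synthetic.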


19. The system of claim 8 wherein compressing any voiced frame 
comprises: 

identifying at least one segment of the frame as a template; 
searching for a matching segment whose cross correlation peak exceeds
a predetermined threshold;

cutting out the signal between the template and the match; and 
aligning and merging the matching segments of the frame. 

20. The system of claim 8 wherein compressing any voiced frame
comprises:

shifting a segment of the frame from a first position in the frame to a 
second position in the frame; 

deleting the portion of the frame between the first position and the second 
position; and 
adding the shifted segment of the frame to the signal representing the
remainder of the frame by using a sine windowing function for blending the edges 
of the segment with the signal representing the remainder of the frame. 
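The shift-and-blend compression of claim 20 can be sketched as follows. The segment starting at a second position is moved back to a first position, the samples in between are deleted, and the seam is blended over an assumed `overlap` length. The sketch uses a sine window for the fade-in and the matching cosine for the fade-out. This pair is power-complementary (sin² + cos² = 1). When the two segments are strongly correlated, an amplitude-complementary window would avoid a slight level bump. The function name `compress_by_shift` is hypothetical.

```python
import math


def compress_by_shift(x, first, second, overlap):
    """Shorten x by (second - first) samples: shift the segment at `second`
    back to `first`, drop the samples in between, and blend the seam over
    `overlap` samples with a sine/cosine window pair."""
    if not 0 <= first < second or second + overlap > len(x):
        raise ValueError("invalid positions")
    out = list(x[:first])
    for i in range(overlap):
        t = 0.5 * math.pi * (i + 0.5) / overlap
        # Fade the remainder out with cos(t) and the shifted segment in with sin(t).
        out.append(x[first + i] * math.cos(t) + x[second + i] * math.sin(t))
    out.extend(x[second + overlap:])
    return out
```

For a 100-sample frame, shifting the segment at sample 50 back to sample 20 with a 10-sample blend yields a 70-sample frame. Samples before 20 and after the blend region are untouched.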



21. The system of claim 8 wherein both the predetermined minimum
buffer size for stretching one or more decoded frames in the signal buffer and the
predetermined maximum buffer size for compressing one or more decoded 
frames in the signal buffer are optimized to compensate for clock drift between 
an encoder and a decoder. 


22. A method for adaptive playback of received frames of an audio 
signal transmitted across a packet-based network, comprising using a computing 
device to: 

receive a packetized audio signal broadcast across a packet-based
network;

decode each received packet and store the resulting decoded signal 
frame in a signal buffer; 

output a current packet in the case where the current packet has been 
received across the packet-based network; 
instantiate a mute mode whereby a playback of the audio signal is at least
partially muted when a maximum delay time for receiving the current packet has
been exceeded and the current packet has not been received; and

instantiate a packet loss concealment mode whereby the playback of the
audio signal is modified for reducing audible artifacts resulting from one or more
lost packets when a current buffer content has been previously temporally
stretched, the current packet has not yet been received, and a packet
subsequent to the current packet has already been received.
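The mode selection in claim 22 can be summarized as a small decision function. This is a sketch under assumptions. The `WAIT` state, in which buffered audio keeps being stretched, is drawn from claim 39 rather than claim 22. The claims also do not say which mode wins when both the mute and concealment conditions hold. Here concealment takes priority, which is an illustrative choice. All names are hypothetical.

```python
from enum import Enum, auto


class PlayoutMode(Enum):
    NORMAL = auto()    # current packet available: play it
    WAIT = auto()      # keep stretching buffered audio while waiting
    MUTE = auto()      # wait budget exhausted: (partially) mute playback
    CONCEAL = auto()   # packet loss concealment around the missing packet


def select_mode(current_received, waited_ms, max_delay_ms,
                buffer_was_stretched, later_packet_received):
    """Choose the playout mode for the current packet slot."""
    if current_received:
        return PlayoutMode.NORMAL
    # Concealment needs previously stretched content and a later packet to bridge to.
    if buffer_was_stretched and later_packet_received:
        return PlayoutMode.CONCEAL
    if waited_ms > max_delay_ms:
        return PlayoutMode.MUTE
    return PlayoutMode.WAIT
```

Checking concealment before the delay limit lets a late packet be bridged smoothly as soon as its successor arrives, instead of falling into the mute mode.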



23. The method of claim 22 further comprising analyzing the content of
the signal buffer for determining a current length of the contents of the signal
buffer.

24. The method of claim 23 further comprising stretching and outputting
one or more decoded frames from the signal buffer when the current length of 
the contents of the signal buffer is less than a predetermined minimum buffer 
size. 

25. The method of claim 24 wherein the predetermined minimum buffer 
size is optimized to compensate for clock drift between an encoder and a 
decoder. 

26. The method of claim 23 further comprising compressing and 
outputting one or more decoded frames from the signal buffer when the current 
length of the contents of the signal buffer is greater than a predetermined 
maximum buffer size. 

27. The method of claim 26 wherein the predetermined maximum
buffer size is optimized to compensate for clock drift between an encoder and a 
decoder. 

28. The method of claim 22 wherein modification of the playback of the
audio signal in the packet loss concealment mode comprises:

computing an average energy for a frame in the signal buffer immediately 
preceding the current packet that has not yet been received; 

computing an average energy for a frame in the signal buffer immediately 
succeeding the current packet that has not yet been received; and 

determining a target frame size for both the preceding and succeeding
frames as a function of the ratio of the average energy of the succeeding
frame to the average energy of the preceding frame.

29. The method of claim 28 wherein determining a target frame size for 
both the preceding and succeeding frames further comprises stretching the 
succeeding frame and the preceding frame by an amount that is inversely
proportional to the ratio of the average energies.
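The energy computation in claims 28 and 29 can be sketched under loudly stated assumptions. The sketch takes the average energy as the mean squared sample value. It then stretches both neighboring frames by `scale_samples / rho`, where rho is the succeeding-to-preceding energy ratio, capped at `max_extra`. `scale_samples`, `max_extra`, and the function names are hypothetical tuning choices, not values from the claims.

```python
def average_energy(frame):
    """Mean squared sample value of a frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def target_frame_sizes(preceding, succeeding, scale_samples, max_extra):
    """Target lengths for the frames around a missing packet. Both frames are
    stretched by an amount inversely proportional to rho = E_succ / E_prec."""
    e_prev = average_energy(preceding)
    e_next = average_energy(succeeding)
    if e_prev == 0.0:
        extra = 0                  # rho is unbounded: stretch amount tends to zero
    elif e_next == 0.0:
        extra = max_extra          # rho is zero: use the cap
    else:
        rho = e_next / e_prev
        extra = min(max_extra, round(scale_samples / rho))
    return len(preceding) + extra, len(succeeding) + extra
```

With 160-sample frames whose energies are 1 and 4, rho is 4 and a `scale_samples` of 80 adds 20 samples to each frame. A much quieter succeeding frame drives the amount up to the cap.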

30. The method of claim 29 wherein instantiating the mute mode
comprises generating and providing playback of a comfort noise signal to replace
lost packets, said comfort noise signal being generated from at least one signal 
frame stored in a silence buffer, said signal frame having been determined to 
represent nominal background noise. 

31. The method of claim 30 further comprising periodically replacing
the signal frames in the silence buffer as a function of a computed energy of
those frames. 
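One way to maintain the silence buffer of claim 31 is sketched below. The replacement policy is an assumption: a new frame replaces the highest-energy stored frame whenever it is quieter, so the buffer drifts toward the lowest-energy, most noise-like frames. The caller decides how often to offer frames. The class name and `capacity` parameter are hypothetical.

```python
class SilenceBuffer:
    """Keeps up to `capacity` frames judged to represent background noise,
    replacing the highest-energy stored frame whenever a quieter one arrives."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.frames = []            # list of (energy, frame) pairs

    @staticmethod
    def energy(frame):
        """Mean squared sample value of a non-empty frame."""
        return sum(s * s for s in frame) / len(frame)

    def offer(self, frame):
        """Consider `frame` for storage; return True if it was stored."""
        e = self.energy(frame)
        if len(self.frames) < self.capacity:
            self.frames.append((e, list(frame)))
            return True
        loudest = max(range(len(self.frames)), key=lambda i: self.frames[i][0])
        if e < self.frames[loudest][0]:
            self.frames[loudest] = (e, list(frame))
            return True
        return False
```

Because this policy only ever lowers the stored energy, a practical variant might also age out old frames so the comfort noise can follow a rising background level.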

32. The method of claim 30 wherein generating the comfort noise 
signal from the at least one signal frame stored in a silence buffer comprises: 

automatically computing the FFT of the at least one signal frame stored in 
the silence buffer; 

introducing a random rotation of the phase into the FFT coefficients; 
computing the inverse FFT for each segment, thereby creating the at least 
one synthetic silence segment; and 

providing the at least one silence segment for playback as the comfort 
noise signal. 

33. A computer-readable medium having computer executable
instructions for providing adaptive decoding and playback of a packetized audio
signal, said computer executable instructions comprising:

receiving a plurality of network packets, said network packets representing 
a packetized audio signal; 

decoding each network packet as it is received and storing the decoded
packet as a signal frame in a signal buffer;

estimating an LPC filter for each signal frame, computing an LPC residual
from each signal frame using the estimated LPC filter, and storing each LPC 
residual in an LPC residual buffer; 

examining a current length of the LPC residual buffer; 

stretching and outputting a current LPC residual from the LPC residual 
buffer when the current length of the LPC residual buffer is less than a 
predetermined minimum buffer size; and 

computing an inverse LPC of the stretched LPC residual, and outputting 
the result as a current signal frame. 

34. The computer-readable medium of claim 33 wherein the 
predetermined minimum buffer size is optimized to compensate for clock drift 
between an encoder and a decoder. 

35. The computer-readable medium of claim 33 further comprising: 
compressing and outputting a current LPC residual from the LPC residual 

buffer when the current length of the LPC residual buffer is greater than a 
predetermined maximum buffer size; and 

computing an inverse LPC of the compressed LPC residual, and 
outputting the result as a current signal frame. 

36. The computer-readable medium of claim 35 wherein the 
predetermined maximum buffer size is optimized to compensate for clock drift 
between an encoder and a decoder. 

37. The computer-readable medium of claim 33 further comprising 
instantiating a mute mode whereby a playback of the audio signal is at least 
partially muted in the case where a maximum delay time for receiving a current 
packet has been exceeded, and the current packet has not been received. 

38. The computer-readable medium of claim 33 further comprising
instantiating a packet loss concealment mode whereby a playback of the audio 
signal is modified for reducing audible artifacts resulting from one or more lost 
packets in the case where a current LPC residual buffer content has been 
previously stretched, a current packet has not yet been received, and a packet 
subsequent to the current packet has already been received. 

39. A method for providing adaptive signal playback, comprising using 
a computing device to: 

receive signal packets representing a digitized audio signal transmitted 
across a packet-based network; 

decode the packets to reconstruct the digitized audio signal; 

store the reconstructed digitized audio signal in a signal buffer; 

provide content of the signal buffer for playback as required by a playback 
device; 

begin stretching contents of the signal buffer when an expected signal 
packet has not been received at an expected time; and 

continue stretching contents of the signal buffer until occurrence of a
condition selected from (1) actual receipt of the expected signal packet and (2) a
determination that the expected signal packet is lost.

40. The method of claim 39 wherein the determination that the 
expected signal packet is lost is a function of the amount of stretching already 
applied to the contents of the signal buffer, receipt of one or more subsequent 
expected signal packets, and existing content of the signal buffer. 

41. The method of claim 39 further comprising muting playback of the
audio signal when a predetermined delay time has been exceeded without 
receiving any signal packets. 

42. The method of claim 39 further comprising stretching contents of
the signal buffer when the length of the contents in the signal buffer is less than a 
predetermined threshold. 

43. The method of claim 42 wherein the predetermined threshold is
optimized to compensate for clock drift between an encoder and a decoder.

44. The method of claim 39 further comprising compressing contents of
the signal buffer when the length of the contents in the signal buffer exceeds a
predetermined threshold.

45. The method of claim 44 wherein the predetermined threshold is 
optimized to compensate for clock drift between an encoder and a decoder. 

46. The method of claim 39 further comprising removing content from
the signal buffer as it is provided for playback as required by a playback device.

47. The method of claim 39 further comprising analyzing contents of the
signal buffer to determine a content type of at least part of the contents of the
signal buffer.

48. The method of claim 47 wherein the content type is quasi-periodic,
and wherein stretching contents of the signal buffer comprises:

identifying at least one segment of the voiced frame as a template;

searching for a matching segment in adjacent frames whose cross
correlation peak exceeds a predetermined threshold; and

aligning and merging the matching segments of the frame. 

49. The method of claim 47 wherein the content type is aperiodic, and
wherein stretching contents of the signal buffer comprises:

computing at least one FFT from at least one part of the contents of the
signal buffer; 

randomizing a phase rotation of the coefficients of at least one of the 
computed FFTs; 

computing an inverse FFT from the coefficients for each FFT to synthesize 
a signal segment corresponding to each computed FFT; and 

stretching at least part of the contents of the signal buffer by inserting each
synthesized signal segment into the buffered audio signal.

50. The method of claim 49 further comprising: 
applying an estimated LPC filter to the contents of the signal buffer to 
compute an LPC residual for use in place of the contents of the signal buffer for 
computing the at least one FFT from at least one part of the contents of the 
signal buffer; and 

applying an interpolated inverse LPC filter to the signal segment 
corresponding to each computed FFT prior to stretching at least part of the 
contents of the signal buffer by inserting each synthesized signal segment into 
the buffered audio signal. 






